Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment

Author

  • G. Kesavaraj

Abstract

Big data mining methods support knowledge discovery on highly scalable, high-volume, and high-velocity data. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis that manages both homogeneous and heterogeneous computing models. The MapReduce framework divides the data and tasks into small units and processes them in parallel. Frequent itemset mining methods extract frequent patterns from database transactions. Parallel frequent itemset mining techniques divide the data set into equal-sized partitions and process each partition separately. Data Partitioning in Frequent Itemset Mining on Hadoop Clusters (FiDoop-DP) is adapted to perform load-balanced rule mining. Its Voronoi-diagram-based data partitioning scheme exploits the relationships between transactions, and the partitioning process controls redundant transactions using a similarity metric and the Locality Sensitive Hashing (LSH) technique. The Parallel Frequent Pattern Growth algorithm is employed to discover the frequent itemsets. The parallel rule mining process is built to support dynamic data partitioning and distribution over heterogeneous Hadoop clusters, in which each computational node has a different resource level. Data-aware partitioning is carried out with a load balancing mechanism, and the computational resource level of each node is also used in the partitioning process. The FiDoop-DP scheme is upgraded to handle data placement with load balancing under the Hadoop Distributed File System (HDFS) across heterogeneous nodes. The parallel frequent itemset mining process is further improved with energy efficiency features.
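The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch, under stated assumptions, of the two ideas it combines: grouping similar transactions with MinHash-based LSH so that redundant transactions end up on the same node, and distributing those groups over nodes in proportion to each node's resource level. It is not FiDoop-DP's Voronoi-based method: here a simple union of LSH band buckets stands in for Voronoi pivots, and the capacity weights, the greedy placement rule, the function names (`minhash_signature`, `group_similar`, `assign_to_nodes`) and the parameters (`num_hashes`, `bands`) are all hypothetical.

```python
import hashlib
from collections import defaultdict


def minhash_signature(transaction, num_hashes):
    """MinHash signature of a non-empty item set: for each seed, the minimum
    64-bit hash over the items (hypothetical helper, md5 used as a seeded hash)."""
    return tuple(
        min(
            int.from_bytes(hashlib.md5(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in transaction
        )
        for seed in range(num_hashes)
    )


def group_similar(transactions, num_hashes=8, bands=4):
    """Banded LSH: transactions that share any band bucket are merged into one group.
    Stand-in for Voronoi-based grouping; assumes num_hashes is divisible by bands."""
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for tid, t in enumerate(transactions):
        sig = minhash_signature(t, num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(tid)

    # Union-find merges overlapping buckets into connected groups.
    parent = list(range(len(transactions)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in buckets.values():
        for m in members[1:]:
            parent[find(m)] = find(members[0])

    groups = defaultdict(list)
    for tid in range(len(transactions)):
        groups[find(tid)].append(tid)
    return list(groups.values())


def assign_to_nodes(groups, capacities):
    """Hypothetical heterogeneity-aware placement: each group (largest first) goes to
    the node whose load-to-capacity ratio would be smallest after receiving it."""
    loads = {node: 0 for node in capacities}
    placement = {node: [] for node in capacities}
    for group in sorted(groups, key=len, reverse=True):
        node = min(capacities, key=lambda n: (loads[n] + len(group)) / capacities[n])
        placement[node].extend(group)
        loads[node] += len(group)
    return placement


# Usage: four identical transactions plus eight unrelated single-item transactions,
# placed on a node with twice the (assumed) resource level of the other.
transactions = [{"bread", "milk"}] * 4 + [{f"item{i}"} for i in range(8)]
groups = group_similar(transactions)
placement = assign_to_nodes(groups, {"fast": 2.0, "slow": 1.0})
# The "fast" node receives 8 of the 12 transactions, including all four identical ones.
print({node: len(tids) for node, tids in placement.items()})
```

The identical transactions share every band bucket and therefore stay together, while the node with twice the capacity weight receives twice as many transactions. In a real heterogeneous Hadoop cluster the capacity weights would have to be derived from measured node resources; that mapping is an assumption not described in the abstract.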

Similar sources

Dynamic Load Balancing for Parallel Association Rule Mining on Heterogenous PC Cluster Systems

The dynamic load balancing strategies for parallel association rule mining are proposed under heterogeneous PC cluster environment. PC cluster is recently regarded as one of the most promising platforms for heavy data intensive applications, such as decision support query processing and data mining. The development period of PC hardware is becoming extremely short, which results in heterogeneou...

Parallel Association Rule Mining on Heterogeneous System

Association rule mining from transaction-oriented databases is one of the important processes that find relations between items, and it plays an important role in decision making. Parallel algorithms are required because of the large size of the databases to be mined. Most of the algorithms designed were for homogeneous systems and use static load balancing techniques, which is far from reality. A parallel algorit...

Association rule mining and load balancing strategy in grid systems

Parallel and distributed systems represent one of the important solutions proposed to improve the performance of sequential association rule mining algorithms. However, the parallelization and distribution process is not trivial and still faces many problems of synchronization, communication, and workload balancing. Our study is limited to the workload balancing problem. In this paper, ...

Design and Analysis of a Dynamic Load Balancing Strategy for Large-Scale Distributed Association Rule Mining

Association rule mining is one of the most important data mining techniques. Algorithms of this technique search a large space, considering numerous different alternatives and scanning the data repeatedly. Parallelism seems to be the natural solution in order to be able to work with industrial-sized databases. Large-scale computing systems, such as Grid computing environments, are recently rega...

Applying Parallel Association Rule Mining to Heterogeneous Environment

The work aims to discover frequent patterns by generating candidates, framing the association rules, and then filtering out only the efficient rules based on various rule interestingness measures. As all these steps require heavy computation, applying complete parallelization to every individual phase would yield better performance. The paper illustrates the system behavior in a heterogeneo...

Publication date: 2017